Improved Word Alignment with Statistics and Linguistic Heuristics

Author

Ulf Hermjakob

Abstract

We present a method to align words in a bitext that combines elements of a traditional statistical approach with linguistic knowledge. We demonstrate this approach for Arabic-English, using an alignment lexicon produced by a statistical word aligner, as well as linguistic resources ranging from an English parser to heuristic alignment rules for function words. These linguistic heuristics have been generalized from a development corpus of 100 parallel sentences. Our aligner, UALIGN, outperforms both the commonly used GIZA++ aligner and the state-of-the-art LEAF aligner on F-measure and produces superior scores in end-to-end statistical machine translation: +1.3 BLEU points over GIZA++ and +0.7 over LEAF.

Related resources

Linguistic Heuristics in Word Alignment

The IBM statistical machine translation (SMT) models [Brown et al. 1993] have been extremely influential in computational linguistics in the past decade. The (arguably) most striking characteristic of the IBM-style SMT models is their total lack of inherent linguistic knowledge. The IBM models demonstrated how much one can do with pure statistical techniques. This has inspired a whole new genera...

Guiding Statistical Word Alignment Models With Prior Knowledge

We present a general framework to incorporate prior knowledge, such as heuristics or linguistic features, in statistical generative word alignment models. Prior knowledge acts as a set of probabilistic soft constraints between bilingual word pairs that are used to guide word alignment model training. We investigate knowledge that can be derived automatically from the entropy principle and bilingu...

Diversify and Combine: Improving Word Alignment for Machine Translation on Low-Resource Languages

We present a novel method to improve word alignment quality and eventually the translation performance by producing and combining complementary word alignments for low-resource languages. Instead of focusing on the improvement of a single set of word alignments, we generate multiple sets of diversified alignments based on different motivations, such as linguistic knowledge, morphology and heuri...

A Maximum Entropy Approach to Combining Word Alignments

This paper presents a new approach to combining outputs of existing word alignment systems. Each alignment link is represented with a set of feature functions extracted from linguistic features and input alignments. These features are used as the basis of alignment decisions made by a maximum entropy approach. The learning method has been evaluated on three language pairs, yielding significant ...

Combining Linguistic and Machine Learning Techniques for Word Alignment Improvement

Necip Fazıl Ayan, Doctor of Philosophy, 2005. Dissertation directed by Professor Bonnie J. Dorr, Department of Computer Science. Alignment of words, i.e., detection of corresponding units between two sentences that are translations of each other, has been shown to be crucial for the success ...

Publication date: 2009
